# Towards Prognostics of Power MOSFETs: Accelerated Aging and Precursors of Failure

Jose R. Celaya <sup>1</sup>, Abhinav Saxena <sup>1</sup>, Philip Wysocki <sup>2</sup>, Sankalita Saha <sup>3</sup>, and Kai Goebel <sup>4</sup>

<sup>1</sup> SGT Inc., NASA Ames Research Center, Intelligent Systems Division, Moffett Field, CA 94035, USA jose.r.celaya@nasa.gov
abhinav.saxena@nasa.gov

<sup>2</sup> ASRC Aerospace, NASA Ames Research Center, Moffett Field, CA 94035, USA philip.f.wisocki@nasa.gov

<sup>3</sup> MCT Inc., NASA Ames Research Center, Intelligent Systems Division, Moffett Field, CA 94035, USA sankalita.saha@nasa.gov

<sup>4</sup> NASA Ames Research Center, Intelligent Systems Division, Moffett Field, CA 94035, USA kat.goebel@nasa.gov

### **ABSTRACT**

This paper presents research results on power MOSFETs (metal-oxide-semiconductor field-effect transistors) within the prognostics and health management of electronics. Experimental results are presented for the identification of the on-resistance as a precursor of failure for devices with die-attach degradation as the failure mechanism. Devices are aged under power cycling in order to trigger die-attach damage. In situ measurements of key electrical and thermal parameters are collected throughout the aging process and then used for analysis and computation of the on-resistance parameter. Experimental results show that the devices experience die-attach damage and that the on-resistance captures the degradation process in such a way that it could be used for the development of prognostics algorithms (data-driven or physics-based).

### 1. INTRODUCTION

The failure of electronic devices is of great concern for future aircraft, which will see an increase in the number of electronic systems for drive and control equipment critical to safety throughout the aircraft. This paper presents research results on power semiconductor devices within the prognostics and health management of electronics. Gate-controlled power transistors such as power MOSFETs (metal-oxide-semiconductor field-effect transistors) are power semiconductor devices employed in a variety of switch-mode power supplies and electrical motor drivers where high-frequency switching of high-power signals is required.

The current research efforts for prognostics and health management of these devices focus on the identification of failure mechanisms and the development of accelerated aging methodologies and systems that accelerate the aging of test devices while performing in situ measurements of key electrical and thermal parameters. Accelerated aging systems allow for the understanding of the effects of failure mechanisms and the identification of leading indicators of failure, which are essential in the development of physics-based degradation models and in the prediction of remaining useful life. Some failure mechanisms of power transistors are related to the packaging of the devices, particularly due to mechanical stresses caused by thermal cycling. Thermal cycling, as an aging methodology, is regularly used to accelerate the aging of the devices by cycling between temperatures considerably higher than those seen in normal operation. This is representative of the way these devices operate in real-world applications and is the methodology used to assess reliability.

This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 United States License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

This work presents results on the identification of precursors of failure with regard to the die-attach damage failure mechanism. An accelerated life test system under power cycling is used to induce die-attach degradation due to thermal overstress. Power cycling as a means to accelerate die-attach degradation is a common methodology established within electronics reliability testing.

Report date: October 2010. Conference held in Portland, Oregon, on October 10-14, 2010. Approved for public release; distribution unlimited.

Die-attach damage as a failure mechanism is suspected to be due to mechanical shear stresses generated at the interfaces of the device assembly. The device is essentially a bi-material assembly in which a silicon die is attached to a copper substrate using lead-free solder (the die-attach); see Figure 5. Silicon and copper differ considerably in their linear coefficients of thermal expansion. As a result, temperature cycling of this assembly creates stresses at the interfaces of the die-attach material with the die and the substrate. Under certain conditions, these stresses develop into cracks and voids in the die-attach material, which considerably degrade the thermal dissipation capabilities of the device because the reduced contact area at the interfaces diminishes thermal conduction. Consequently, the device is forced to operate at higher temperatures, thereby accelerating the degradation process even further.
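To give a feel for the size of the thermal-expansion mismatch between silicon and copper, the following minimal sketch computes the unconstrained differential strain for a given temperature swing. The coefficient values are approximate room-temperature textbook figures, not values reported in this work, and the function names, the 100 K swing, and the 2 mm half-length are hypothetical. The calculation ignores the solder layer and any stress relaxation, so it illustrates only the driving mismatch, not the actual stress state.

```python
# Approximate linear coefficients of thermal expansion near room temperature.
# Assumed textbook values, NOT taken from the paper.
ALPHA_SI = 2.6e-6   # 1/K, silicon
ALPHA_CU = 17e-6    # 1/K, copper


def mismatch_strain(delta_t_kelvin, alpha_a=ALPHA_CU, alpha_b=ALPHA_SI):
    """Unconstrained differential thermal strain between two bonded materials."""
    return (alpha_a - alpha_b) * delta_t_kelvin


def edge_displacement_um(delta_t_kelvin, half_length_mm):
    """Free differential expansion, in micrometers, at a given distance
    (in millimeters) from the center of the die."""
    return mismatch_strain(delta_t_kelvin) * half_length_mm * 1e3


# Hypothetical example: a 100 K swing and a point 2 mm from the die center.
strain = mismatch_strain(100.0)
displacement = edge_displacement_um(100.0, 2.0)
print(f"mismatch strain: {strain:.2e}")
print(f"free differential displacement: {displacement:.2f} um")
```

Because the mismatch scales linearly with the temperature swing, wider thermal cycles impose proportionally larger shear loading on the die-attach layer.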

The drain-to-source on-resistance $R_{DS(on)}$, measured while the MOSFET is in the on state, is a key parameter of a power MOSFET. This parameter is identified and isolated as a precursor of failure for the die-attach failure mechanism. $R_{DS(on)}$ is highly dependent upon junction temperature ($T_j$), and its normalization with respect to junction temperature will provide a clear window for the development of models of the degradation process as a function of observed $R_{DS(on)}$ values through time.

### 1.1 Related work

In the area of accelerated aging of power electronics, several approaches have been employed in reliability studies. One such method involves electrical pulsing of power MOSFETs under controlled temperatures to cause electro-thermal fatigue (Khong et al., 2007, 2005). These experiments simulate the stresses, and hence the accelerated aging conditions, typically experienced by automotive components, with current levels of 120 A and a duty cycle of 5–10%. The experimental results have shown that the accelerated aging leads to an increase in the drain-source on-resistance of the power MOSFET. This increase was shown to be a result of die-attach delamination and bond-wire cracking at the source terminal.

A reliability assessment of power MOSFETs under high temperatures was performed by Dupont et al. (2007), where the devices were power cycled with a drain current of 150 A and a duty cycle of 30%. Junction temperatures up to $175^{\circ}\text{C}$ were reported in this work. It was observed that a high change in junction temperature ($\Delta T_j$) resulted in high drain-to-source leakage current, while high on-resistance due to bond wire cracking was observed when the $\Delta T_j$

Thermal stress and electrical stress are the most common accelerated aging methodologies. Thermal cycling and chronic temperature overstress are prevalent thermal stress methods where the devices are subjected to rapid changes in temperature, causing thermal expansion and contraction. The most common failure modes under such aging methodologies are various forms of package failure, such as die solder degradation and wire lift. Experiments on MOSFETs cycled 7000 times from $50^{\circ}\text{C}$ to $100^{\circ}\text{C}$ showed void formation in over 30% of the die-attach (Katsis & Wyk, 2003). Similar results were demonstrated for IGBTs (insulated gate bipolar transistors) undergoing power cycling (Morozumi et al., 2003; Thebaud et al., 2003; Wu et al., 1995); these results also showed the occurrence of wire lift in the devices. Another form of thermal overstress involves subjecting devices to high temperatures for extended periods of time. This type of aging accelerates time-dependent dielectric breakdown (TDDB) (Stathis et al., 2005), and transistors stressed under this methodology have exhibited temperature-dependent lifetimes (Reynolds, 1974).

In the area of identifying precursors to failure, IGBTs aged with self-heating have shown changes in current ringing characteristics during switching (Ginart et al., 2008). In another study on the effects of electrical stress on power MOSFETs, a high bias stress was applied at the gate, and it was shown that gate voltages ranging from 88 V to 94 V, applied for 2 hours with the drain and source grounded, resulted in a lowering of the threshold voltage and a reduction in mobility (Stojadinovic et al., 2005).

### 2. ACCELERATED AGING SYSTEM

Accelerated aging, also referred to as accelerated life testing, plays a very important role in the development of Prognostics and Health Management (PHM) solutions for electronics components and systems. Accelerated life testing (ALT) and highly accelerated life testing (HALT) are methodologies frequently used to assess the reliability of products. Accelerated life testing is an essential tool in reliability, particularly for products whose expected lifetime is on the order of thousands of hours, such as electronics components and systems. In such situations, it is not feasible to wait for devices to fail under normal operation in order to compute a time to failure; therefore, ALT methods are used to predict reliability (Suhir, 2007). Accelerated life testing is used in the reliability field to: a) run devices to failure and compute the mean time to failure, in which case the testing is destructive; or b) perform qualification tests, where the device is tested to see if it passes (Suhir, 2007). The development of prognostics algorithms faces the same constraints as reliability in the sense that run-to-failure data for critical electronics systems is rarely or never available. Furthermore, prognostics is concerned not only with the time to failure of devices but with the degradation process as well. Therefore, it is necessary to include in situ measurements of key output variables and observable parameters in the accelerated aging process in order to develop and learn failure progression models.

Thermal, electrical, and mechanical overstresses are regularly applied in the accelerated aging of electronics components in reliability and PHM. The overstress is used to accelerate the aging of these devices, which could otherwise take several years to fail. This acceleration is important because it allows for an assessment of component health over a considerably shorter time.

Thermal cycling and chronic temperature overstress lead to thermo-mechanical stresses in electronics due to the mismatch in the coefficients of thermal expansion of the different elements in the component's packaged structure. As a result, thermal cycling is among the most prevalent accelerated aging methodologies for electronics. Thermal cycling subjects devices to rapid changes in temperature, causing thermal expansions and contractions which generate high mechanical stresses at the interfaces of thermally mismatched materials. It is regular practice in reliability testing to use an environmental chamber or a heat plate to provide direct thermal cycling to an electronic structure while not applying any electrical power to the devices. In our aging setup, we use indirect thermal cycling for accelerated aging of power MOSFETs. In the aging methodology used in our work, thermal gradients result from electrical power applied to the devices, noting that no additional external heat sink is used during the aging process. This results in thermal cycles as well, but not exactly the same type of thermal cycling as found in the electronics reliability literature.

### 2.1 Accelerated aging system description

The aging system used for these experiments is described in detail by Sonnenfeld et al. (2008). This system allows for accelerated aging of gate-controlled power transistors such as power MOSFETs and IGBTs. The capabilities of this system include in situ measurements, different types of stress factors that accelerate device life, and custom-made software that controls the experiment and logs the data from in situ measurements for further analysis. A high-level block diagram is presented in Figure 1; details on the hardware and software implementations are available in (Sonnenfeld et al., 2008). In terms of accelerated life testing, the system can apply different stresses such as thermal, electrical, or a combination of both. The focus here is on thermal cycling, which is achieved by applying power cycling to the devices under test. The system allows for the investigation of different failure mechanisms (intrinsic and extrinsic) such as dielectric breakdown, hot-carrier injection, electromigration, contact migration, wire lift, die-attach degradation, and package delamination. This aging system was designed based on the work of Ginart et al. (2008), and it has been used in the aging of IGBTs and power MOSFETs in order to understand failure mechanisms, identify precursors of failure, and develop degradation models for prognostics and health management of these devices (Celaya et al., 2009; Patil et al., 2009; Saha et al., 2009; Sonnenfeld et al., 2008).



Figure 1: High level diagram of the accelerated aging system.

### 2.2 Aging Experiments

The accelerated aging applied to the devices presented in this work consists of thermal cycling to accelerate degradation using thermal overstress. Latch-up, thermal runaway, or failure to turn ON due to loss of gate control were the failure conditions we considered. Thermal cycles were induced by power cycling the devices without the use of an external heat sink. This greatly reduced the heat dissipation capabilities of the devices, allowing for self-heating of the device during the power switching operation. The device case temperature was the measured and controlled variable for the thermal cycling application. Temperatures were measured in situ using a thermocouple attached to the flange of the copper case.

To enable power cycling, the applied gate voltage was a square wave signal with an amplitude of approximately 15 V, a frequency of 1 kHz, and a duty cycle of 40%. Proper amplification of this signal ensured that enough current was available to charge the gate of the MOSFET at the selected frequency and duty cycle. The drain-source was biased at 4 V DC and a resistive load of $0.2\Omega$ was used on the collector side output of the device.
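The gate drive parameters above fix the timing of each switching pulse. As a small illustrative check, the sketch below derives the period, ON time, and OFF time of a square wave from its frequency and duty cycle; the function name `pulse_timing` is hypothetical and is not part of the aging system software.

```python
def pulse_timing(frequency_hz, duty_cycle):
    """Return (period_s, on_time_s, off_time_s) for a square wave.

    duty_cycle is the fraction of the period spent in the ON state (0 to 1).
    """
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty cycle must lie between 0 and 1")
    period = 1.0 / frequency_hz
    on_time = duty_cycle * period
    return period, on_time, period - on_time


# Gate signal stated in the text: 1 kHz square wave with a 40% duty cycle.
period, t_on, t_off = pulse_timing(1e3, 0.40)
print(f"period = {period * 1e3:.2f} ms, "
      f"ON = {t_on * 1e3:.2f} ms, OFF = {t_off * 1e3:.2f} ms")
```

For the stated 1 kHz, 40% signal this gives an ON time of 0.4 ms per 1 ms period.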

Figure 2 shows the typical response to the square wave control signal applied to the gate. The drain current $I_D$ is greater than zero once the device is in the ON state (the figure shows the voltage output of the current sensor, which is proportional to $I_D$). As a direct result, the drain-to-source voltage $V_{DS}$ drops. This plot represents the high-speed in situ measurements available for further analysis. These measurements are taken only while the device is in the power cycling (switching) regime during the aging process, approximately 400 ms apart. The sampling frequency of these measurements allows for the complete observation of the pulse, which has a duration of 0.4 ms. There are approximately 3000 transient measurements available for every 35 minutes of aging. These measurements are used to compute $R_{DS(on)}$ and could be used to compute ringing characteristics of the turn-ON and turn-OFF transients as well.
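One plausible way to obtain $R_{DS(on)}$ from a single captured transient is to take the ratio $V_{DS}/I_D$ over the samples in which the device is conducting. The sketch below is a minimal illustration under stated assumptions, not the procedure used in this work: the function name `estimate_rds_on`, the current threshold used to select ON-state samples, and the averaging are all hypothetical, and the current samples are assumed to be already converted from the sensor output voltage to amperes.

```python
from statistics import mean


def estimate_rds_on(v_ds, i_d, on_fraction=0.5):
    """Estimate on-resistance (ohms) from one captured pulse.

    v_ds: drain-to-source voltage samples in volts (list of floats)
    i_d:  drain current samples in amperes, same length as v_ds
    Samples whose current is at least on_fraction * max(i_d) are treated as
    ON-state samples (an assumed selection rule, not from the paper).
    """
    if len(v_ds) != len(i_d) or len(i_d) == 0:
        raise ValueError("v_ds and i_d must be non-empty and of equal length")
    peak = max(i_d)
    if peak <= 0:
        raise ValueError("no ON-state samples found in this capture")
    threshold = on_fraction * peak
    # Ohm's law on each ON-state sample, then average to reduce noise.
    ratios = [v / i for v, i in zip(v_ds, i_d) if i >= threshold]
    return mean(ratios)


# Synthetic pulse (made-up numbers): OFF, three ON samples, OFF.
v_samples = [4.0, 4.0, 0.5, 0.5, 0.5, 4.0]
i_samples = [0.0, 0.0, 10.0, 10.0, 10.0, 0.0]
r_on = estimate_rds_on(v_samples, i_samples)
print(f"estimated on-resistance: {r_on:.3f} ohm")
```

In practice, samples near the turn-ON and turn-OFF edges would likely need to be excluded so that ringing does not bias the estimate.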



Figure 2: Transient response to a square control signal at the gate.
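The computation of $R_{DS(on)}$ from one captured transient can be sketched as the average of $V_{DS}/I_D$ over the samples taken while the device conducts. This is only a minimal illustration: the function name, the current threshold used to detect the ON state, and the sample values are assumptions, not the processing used in the experiments.

```python
def on_resistance(v_ds, i_d, i_on_min=1.0):
    """Estimate R_DS(on) in ohms from one transient capture.

    v_ds, i_d : equal-length sequences of drain-source voltage (V)
                and drain current (A) samples.
    i_on_min  : hypothetical current threshold (A); samples at or
                above it are treated as belonging to the ON state.
    """
    if len(v_ds) != len(i_d):
        raise ValueError("v_ds and i_d must have the same length")
    ratios = [v / i for v, i in zip(v_ds, i_d) if i >= i_on_min]
    if not ratios:
        return None  # the device never turned on in this capture
    return sum(ratios) / len(ratios)


# Synthetic pulse: OFF, ON, OFF (values are made up for illustration).
v_ds = [4.0, 4.0, 2.0, 2.0, 2.0, 4.0]
i_d = [0.0, 0.0, 10.0, 10.0, 10.0, 0.0]
r_on = on_resistance(v_ds, i_d)
```

Returning `None` when no ON-state sample is found keeps captures in which the device failed to turn on distinguishable from valid resistance values.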

Temperatures were controlled within a low and high temperature range. The device was set to the power cycling (switching mode) regime if the case temperature fell below the lower threshold $T_{min}$ and it was turned completely off if the temperature reached the upper threshold $T_{max}$. This hysteresis controller, shown in Figure 3, provided the thermal cycles needed to accelerate the aging of the device. During power cycling (switching mode), the square wave control signal described above is used to switch the device. This dissipates a large amount of power, resulting in rapid and excessive heating of the device in the absence of a heat sink. It should be noted that currents and voltages were kept within the safe operating area while the temperature was raised beyond the maximum rating to induce damage.
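The hysteresis logic described above can be sketched as a small state update. The function name and the temperature trace are hypothetical; the 209/210 °C band is taken from aging run 5 in Table 1 purely as an example.

```python
def next_gate_enable(t_case, switching, t_min, t_max):
    """Return True if the device should be in the switching regime.

    t_case    : measured case temperature (deg C)
    switching : current state (True = power cycling, False = off)
    t_min     : lower threshold; switching resumes below it
    t_max     : upper threshold; the device is turned off at or above it
    """
    if t_case >= t_max:
        return False
    if t_case < t_min:
        return True
    return switching  # inside the band: hold the previous state


# Illustrative case temperature trace (deg C), not measured data.
trace = [205.0, 209.5, 210.0, 209.5, 208.9, 209.4]
state = False
states = []
for t in trace:
    state = next_gate_enable(t, state, t_min=209.0, t_max=210.0)
    states.append(state)
```

Holding the previous state inside the band is what produces the heating and cooling cycle: the device keeps switching until $T_{max}$ is reached, then stays off until the case cools below $T_{min}$.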

Figure 3: Thermal overstress aging control.

In addition to the transient measurements described above, there are additional measurements at a lower sampling speed which are used to control the aging experiments. A snapshot of these measurements is presented in Figure 4. In this figure, only $I_D$ and $T_c$ are plotted, along with the logic state of the device (gate state). When the device is in the power cycling regime (gate state is 1), current flows through the drain and the temperature starts to rise. On the other hand, the temperature decreases when the device is not switching. These measurements are single-value measurements with a sampling time of approximately 400 ms. Their main objectives are to monitor the control variables of the experiment, to provide a visual online assessment of the aging process, and to monitor different temperatures. In addition to $I_D$, other voltages like $V_{DD}$ and $V_{DS}$ are monitored. These are not used to compute $R_{DS(on)}$ because their time resolution is not sufficient to ensure that a measurement was taken during the ON state.



Figure 4: Aging control measurements.

### 3. DIE-ATTACH DAMAGE AS FAILURE MECHANISM

As described earlier in Section 1.1, thermal cycling is typically used for accelerated aging in the field of electronics reliability. The objective of accelerated life testing in electronics reliability is to run the devices to failure in a controlled fashion and in a period of time considerably shorter than the intended life of the device in real-life operation. Thermal cycling is widely used to trigger failure mechanisms related to the packaging of the device or to stress bi-metal assemblies typical of flip-chip designs and of power transistors in TO-220 packages, where the copper case of the package serves as the drain/collector pin and provides electrical conduction and thermal dissipation. In general, a reliability study is concerned with the time to failure under an accelerated life test, disregarding the progression of the degradation process and the progression of an incipient fault into a failure. A physics of failure reliability study would apply thermal cycling in an environmental chamber (no electrical power applied) to several devices and record their corresponding times to failure. These data are then used to fit physics-based empirical models such as a power law, Boltzmann-Arrhenius, or Coffin-Manson, depending on the failure mechanism. These models are then used to predict reliability in terms of mean time to failure or other measures like time between failures or availability.
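As an illustration of how such a model is used, the Coffin-Manson relation is commonly written as $N_f = A(\Delta T)^{-n}$, where $N_f$ is the number of cycles to failure and $\Delta T$ the temperature swing. The sketch below computes the resulting acceleration factor between a test condition and a use condition. The function name, the temperature swings, and the exponent are placeholders, not values fitted to the devices in this work.

```python
def coffin_manson_acceleration(delta_t_test, delta_t_use, exponent):
    """Acceleration factor N_use / N_test for N_f = A * dT**(-n).

    The constant A cancels in the ratio, so only the two temperature
    swings (deg C) and the exponent n are needed.
    """
    if delta_t_test <= 0 or delta_t_use <= 0:
        raise ValueError("temperature swings must be positive")
    return (delta_t_test / delta_t_use) ** exponent


# Placeholder numbers chosen only to show the calculation.
af = coffin_manson_acceleration(delta_t_test=100.0, delta_t_use=50.0,
                                exponent=2.0)
```

In practice the exponent would be estimated from times to failure recorded at several temperature swings.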

The accelerated aging methodologies used in reliability serve as a good starting point for generating run to failure data for prognostics algorithm development (data-driven or physics-based). These methodologies should be enhanced to include measurements of key variables throughout the test in order to assess the health of the system. This work makes use of the thermal cycling aging methodology but uses electrical power to heat the device. This results in thermal cycles as well, and it is closer to the way these devices are used in fielded applications.

### 3.1 Thermal stresses due to thermal cycling

The power MOSFET under consideration can be regarded as a bi-metal assembly in flip-chip design as shown in Figure 5. The substrate of the assembly is considered to be the copper plate of the device, the chip is the bare die and it is attached to the substrate by lead-free solder (die-attach). This is a thermally mismatched assembly due to the great difference in the linear coefficient of thermal expansion (CTE) of the materials. The CTE is 16-18 $\mathrm{ppm}/^{\circ}\mathrm{C}$ for copper, 2.6-3.3 $\mathrm{ppm}/^{\circ}\mathrm{C}$ for silicon, and 20-22.9 $\mathrm{ppm}/^{\circ}\mathrm{C}$ for lead-free solder.



Figure 5: Bi-metal assembly representation of power MOSFET under thermo-mechanical loading.

This mismatch in CTE generates stresses on the solder. Consider the manufacturing process, in which the different layers are assembled at a high temperature of approximately $100\,^{\circ}\mathrm{C}$. When the device is allowed to cool down to room temperature, the solder material will be in tension because it is trying to shrink and the silicon does not allow that to happen. A similar situation arises during thermal cycling.
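A rough sense of the magnitude involved can be obtained from the unconstrained thermal strain mismatch between the copper substrate and the silicon die. This is a first-order estimate that ignores the compliance of the solder layer; the symbols $\alpha$ and $\Delta\varepsilon$ and the temperature change used are introduced here only for illustration.

$$\Delta\varepsilon = (\alpha_{Cu} - \alpha_{Si})\,\Delta T$$

Taking approximate mid-range values from the CTE ranges quoted above, $\alpha_{Cu} \approx 17$ and $\alpha_{Si} \approx 3\ \mathrm{ppm}/^{\circ}\mathrm{C}$, an assumed $\Delta T = 100\,^{\circ}\mathrm{C}$ gives

$$\Delta\varepsilon \approx (17 - 3)\times 10^{-6} \times 100 = 1.4\times 10^{-3}.$$

The die-attach layer has to accommodate this differential strain in every thermal cycle.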

Suhir (1986) developed a model of the thermo-mechanical stresses in bi-metal assemblies. It was based on the theory of elasticity and the compliance of the materials. This model identifies shearing stresses on the interfaces of the solder as well as normal stresses perpendicular to the interface. There is a high stress concentration at the ends of the assembly, resulting in the formation of cracks and voids in the solder material. This model provides the theoretical foundation for the die-attach failure mechanism as a result of thermal cycling. It demonstrates that mechanical stresses in the interfaces are a function of the temperature differentials ($\Delta T$) applied through the thermal cycling process. These stresses lead to crack initiation in the die-attach, and the cracks then grow with continued thermal cycling.

### 3.2 Assessment of die-attach health through thermal measurements

The die-attach damage due to thermal cycling results in a reduction of the area of contact at the solder–copper and solder–silicon interfaces. The heat transfer characteristics due to thermal conduction in the die-attach degrade given the decrease in the area of contact. Heat flow is reduced on a degraded die-attach compared with a pristine die-attach.

The steady-state model of the thermal conduction from junction to case is presented in the following equation.

$$P = \frac{T_j - T_c}{\theta_{jc}},\tag{1}$$

where  $\theta_{jc}$  is the junction-to-case thermal impedance in  ${}^{\circ}C/W$  and P is the electrical power dissipated.

As a result of die-attach damage, $\theta_{jc}$ is expected to increase. This will result in a higher $T_j$ during operation of a degraded device, assuming that the power dissipation and the ambient temperature remain fixed. This is a high level approximation of the heat transfer characteristics of the device. The thermal resistance can be measured experimentally with specialized equipment, and it provides an indication of the severity of the degradation in the die-attach.
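Rearranging Eq. (1) as $T_j = T_c + P\,\theta_{jc}$ makes the effect of a larger $\theta_{jc}$ explicit. The sketch below compares a pristine and a degraded device at the same case temperature and power; all numerical values are placeholders rather than measurements from the experiments, and the case temperature is held fixed here for simplicity.

```python
def junction_temperature(t_case, power, theta_jc):
    """Steady-state T_j (deg C) from Eq. (1): T_j = T_c + P * theta_jc."""
    return t_case + power * theta_jc


# Placeholder operating point and thermal resistances.
t_case = 100.0         # case temperature, deg C
power = 10.0           # dissipated power, W
theta_pristine = 3.0   # deg C/W, assumed value for a pristine die-attach
theta_degraded = 4.5   # deg C/W, assumed value after die-attach damage

tj_pristine = junction_temperature(t_case, power, theta_pristine)
tj_degraded = junction_temperature(t_case, power, theta_degraded)
tj_rise = tj_degraded - tj_pristine  # extra junction heating due to damage
```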

Die-attach degradation can also be assessed by measuring heating curves of the junction temperature. Method 3161 of the military standard MIL-STD-750 presents a methodology for thermal impedance measurements of vertical power MOSFETs using the delta source-drain voltage method. This methodology takes advantage of the body diode present in these devices. The voltage drop in the diode is proportional to the junction temperature and it is a very accurate way to measure $T_j$. The device is heated by operating it in the ON state, biasing the gate. $I_D$ is modulated with the gate voltage to provide the required power (in watts) to heat the device. The device is heated for a fixed amount of time, followed by a temperature measurement. In order to make the temperature measurement, the body diode is biased while the gate is held at ground to ensure that a channel is not formed.

A small current  $I_{SD}$  is used to measure the voltage drop on the body diode. This voltage is proportional to  $T_j$ .
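The conversion from body diode voltage to junction temperature can be sketched with a two-point linear calibration. The calibration points, the function names, and the resulting slope are hypothetical; a real measurement would calibrate each device at the chosen sense current.

```python
def k_factor(v1, t1, v2, t2):
    """Slope dV/dT (V per deg C) from two calibration points."""
    return (v2 - v1) / (t2 - t1)


def junction_temp_from_vsd(v_sd, v_ref, t_ref, k):
    """Invert the linear model V_SD = v_ref + k * (T_j - t_ref)."""
    return t_ref + (v_sd - v_ref) / k


# Hypothetical calibration: V_SD (volts) recorded at two known temperatures
# while a small sense current flows through the body diode.
k = k_factor(v1=0.600, t1=25.0, v2=0.450, t2=100.0)
tj = junction_temp_from_vsd(v_sd=0.500, v_ref=0.600, t_ref=25.0, k=k)
```

With these made-up points the slope is negative, so a lower measured $V_{SD}$ maps to a higher $T_j$.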

The heating curve provides an assessment of the thermal characteristics of the die-attach. Figure 6 shows the heating curves for a pristine device and for an aged device using the aging procedure described in the following section. The heating time is one second, and it can be clearly observed that the diminished thermal dissipation capability of the aged device results in a rapid increase of the junction temperature. The steep slope starting at approximately 10 ms is indicative of the thermal performance of the die-attach.



Figure 6: Die-attach thermal performance assessment (aged = device #11).

### 3.3 Failure analysis of aged devices

Failure analysis techniques like Scanning Acoustic Microscopy (SAM) and X-rays have been used to assess the state of the die-attach for power MOSFETs and IGBTs aged with the previously described aging system (Celaya et al., 2009; Ginart et al., 2008; Patil et al., 2009).

Figure 7 is an X-ray image of a new device. The dark area represents the die-attach solder (the silicon is transparent in an X-ray image), and the rectangular shape represents the solder below the die. It can be seen that there is solder below the die covering all the die area; therefore, the area of contact for heat transfer by conduction from the silicon to the copper is the same as the area of the die. Figure 8 is an X-ray image of device number 8 after aging. The aging procedure is described in Table 1 in Section 4.1. The shadows at the bottom (the region marked with a dotted red line) represent solder material that has migrated as a result of the thermal stresses and the increase of internal temperatures beyond the melting point of the solder. Voids are also observed below the die area. As a result, the area of contact for heat conduction has decreased, resulting in a decrease in the thermal dissipation performance of the device. This result is consistent with the observed thermal dissipation performance presented in Section 3.2. Similar results were obtained for the remaining aged devices used in this work.

The theory of thermo-mechanical stresses, along with the failure analysis and the thermal assessment of die-attach performance, indicates that the proposed aging methodology generates die-attach degradation as a failure mechanism.



Figure 7: X-ray image of new device.



Figure 8: X-ray image after aging (device #8).

### 4. DRAIN TO SOURCE ON RESISTANCE AS A PRECURSOR OF FAILURE

The relationship between $R_{DS(on)}$ and $T_j$ for a power MOSFET has been well documented by Baliga (2008) and it has been described in the context of accelerated aging for PHM by Celaya et al. (2009). As described in previous sections, $T_j$ increases as a result of die-attach degradation for a fixed ambient temperature or for a fixed $T_c$. In a power MOSFET, $R_{DS(on)}$ increases with temperature due to several factors like the reduction of mobility in the drift region and the inversion layer, as well as a reduction in the threshold voltage $V_{GS(th)}$ (Baliga, 2008). As a result, the power MOSFET has a positive temperature coefficient for on-resistance, which results in an increase in power losses when operated at high temperatures (Baliga, 2008).

The mobility in the drift region decreases in proportion to $T^{-2.42}$ and the mobility of the inversion layer decreases in proportion to $T^{-1}$ (Baliga, 2008). As a result, $R_{DS(on)}$ is expected to increase quadratically with respect to $T_j$. This agrees with the experimental results presented in the following section. Even though $T_c$ is used to make the assessment, a quadratic change in the resistance is observed as the temperature increases during the aging experiments.
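To give a feel for these power laws, the sketch below evaluates how a resistance that is inversely proportional to mobility scales between two absolute temperatures. The 300 K reference, the 400 K evaluation point, and the function names are assumptions chosen for illustration; only the exponents come from the text above.

```python
def mobility_ratio(temp_k, ref_k=300.0, exponent=-2.42):
    """Mobility at temp_k relative to ref_k for mobility ~ T**exponent."""
    return (temp_k / ref_k) ** exponent


def resistance_ratio(temp_k, ref_k=300.0, exponent=-2.42):
    """Resistance relative to ref_k, assuming it scales as 1/mobility."""
    return 1.0 / mobility_ratio(temp_k, ref_k, exponent)


# Temperatures are absolute (kelvin); 300 K and 400 K are example values.
drift_400k = resistance_ratio(400.0)                    # drift region
channel_400k = resistance_ratio(400.0, exponent=-1.0)   # inversion layer
```

Under these assumptions the drift region contribution roughly doubles between 300 K and 400 K, while the inversion layer contribution grows linearly with absolute temperature.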

The selection of $R_{DS(on)}$ as a potential precursor of failure for the die-attach degradation process follows naturally from the solid-state physics of the power MOSFET structure. This also applies to a packaged device like the devices under consideration in this work, which have a TO-220 package. The arguments in support of this variable as a precursor of failure are the following:

- destructive testing to assess health of the die-attach is not an option for a PHM application;
- even though there are ways to assess the health of the die-attach by assessing thermal performance, the required equipment might not be used for in-situ assessment;
- the on-resistance provides a window into the degradation process, making a case for physics-based models of the die-attach degradation process in which the state of the die-attach could be observable by response variables like $R_{DS(on)}$;
- it is relatively easy to measure  $R_{DS(on)}$  in situ and there are discrete devices on the market that provide sensing capabilities for  $I_D$ .

### 4.1 Experimental Results

As described above, the aging methodology results in thermal overstress as a result of thermal cycling and internal temperatures beyond the normal operating range of the devices. In the aging system, there is no access to junction temperature ($T_j$) measurements. Therefore, measurements of package temperature ($T_p$) and case temperature ($T_c$) are used instead. While $T_c$ and $T_p$ are directly affected by variations in $T_j$ resulting from power switching, the relationship between them and $T_j$ is rather complex. Several factors contribute to this. The thermal impedance between the die and the case consists of several layers of different materials, i.e. die (silicon), die-attach (lead-free solder), case (copper), and the package (epoxy). This thermal impedance has capacitive and resistive components that vary independently in all the layers. Furthermore, an accurate in situ assessment of $T_j$ is restricted by the high switching frequency in our experiments, which is much faster than the thermal dynamics by which changes in junction temperature become visible on the outside of the device. This causes a delay between the measured $R_{DS(on)}$ and the measured temperatures. The effects of such delays can be clearly seen in the measurements shown in Figure 9. This figure shows how the temperature measured on the package (blue in the upper chart) lags the changes in $R_{DS(on)}$, whereas the temperature measured on the copper case (green in the lower chart) shows a much better correspondence to the computed $R_{DS(on)}$. Therefore, in the rest of the study all analysis was done using $T_c$.

As a device undergoes thermal cycling, its internal structure undergoes mechanical stresses at the interfaces due to the different thermal expansion and contraction rates of the layer materials. Persistent cycling under such stresses is expected to create imperfections or even cracks in the die-attach layer. However, in our first aging methodologies (Celaya et al., 2009; Patil et al., 2009; Sonnenfeld et al., 2008) the devices were operated at very high temperatures, estimated to be beyond the melting point of the die-attach material. In some cases, it was observed that molten material had cracked the package apart and appeared on the surface, ultimately leading to device failure. In the results presented here, the tests were carefully designed to avoid such situations and to age multiple devices in a controlled and repeatable manner. This meant aging at lower temperatures and following a step load temperature profile as indicated in Table 1. The temperature levels were adjusted to keep a maximum of $T_c = 250\,^{\circ}\mathrm{C}$ within the first aging cycle and successively reduce it by 10 °C in further cycles until the device failed. Five devices were aged under this procedure.



Figure 9: Relationship of package and case temperatures to on-resistance.

Table 1: Aging regime (all temperatures in °C)

| Aging run | Target $T_c$ | $T_{min}$ | $T_{max}$ | Aging time (min) |
|-----------|--------------|-----------|-----------|------------------|
| 1         | 250          | 249       | 250       | 35               |
| 2         | 240          | 239       | 240       | 35               |
| 3         | 230          | 229       | 230       | 35               |
| 4         | 220          | 219       | 220       | 35               |
| 5         | 210          | 209       | 210       | 240              |
| 6         | 210          | 209       | 210       | 180              |
| 7         | 210          | 209       | 210       | 180              |

Our hypothesis behind these tests was that if thermal cycling causes die-attach degradation, its effect should be reflected in the on-resistance computed from $V_{DS}$ and $I_D$ measurements. A limitation in establishing $R_{DS(on)}$ as a precursor of die-attach degradation was its dependence on junction temperature, which could not be easily isolated in our experiments at this time. Therefore, we attempted to learn the dependence of $R_{DS(on)}$ on temperature using data from 5 repeated experiments conducted under similar environmental conditions and aging regimes. Figure 10 shows the relationship between the measured on-resistance and the temperature in the initial aging runs, where the device is expected to be in pristine and aged conditions, respectively.



Figure 10:  $R_{DS(on)}$  vs.  $T_c$  for the first 4 aging runs.

This device lasted all seven aging cycles, spanning over 13 hours. During the first four aging cycles we see negligible changes in the relationship between on-resistance and temperature. However, as shown in Figure 11, in the next three successive tests this relationship changes drastically. We attribute these changes to the die-attach degradation, which was confirmed in the previous section. As described in the previous section, $R_{DS(on)}$ has a quadratic relationship with temperature, and this can be observed in the previous figures even though they show $T_c$ instead of $T_j$. The migration of these curves as a function of aging time could be used as a mechanism for damage detection and for diagnostics of the die-attach failure mechanism. It could also be used to assess the remaining life of the device, considering, for example, the red curve from aging run 7 (Figure 11) as the failure threshold.
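One way the migration of the curves could be quantified is to fit a quadratic to the $R_{DS(on)}$ vs. $T_c$ data of an early aging run and then measure how far later data sit above that baseline. The sketch below is an assumption-laden illustration: the helper names and all data values are made up, and the fit is an ordinary least-squares quadratic on mean-centered temperatures, not the analysis used to produce the figures.

```python
def fit_quadratic(x, y):
    """Least-squares fit y ~ a*u**2 + b*u + c with u = x - mean(x).

    Returns (a, b, c, x_mean). Centering x keeps the normal
    equations well conditioned for temperatures of a few hundred deg C.
    """
    x_mean = sum(x) / len(x)
    u = [xi - x_mean for xi in x]
    s = [sum(ui ** k for ui in u) for k in range(5)]            # sums of u^k
    t = [sum(yi * ui ** k for ui, yi in zip(u, y)) for k in range(3)]
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                f = m[row][col] / m[col][col]
                m[row] = [p - f * q for p, q in zip(m[row], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c, x_mean


def evaluate(coeffs, x):
    """Evaluate the fitted quadratic at temperature x."""
    a, b, c, x_mean = coeffs
    u = x - x_mean
    return a * u * u + b * u + c


# Made-up data: case temperature (deg C) and on-resistance (ohms).
temps = [150.0, 170.0, 190.0, 210.0, 230.0]
baseline = [0.2450, 0.2578, 0.2722, 0.2882, 0.3058]   # early aging run
aged = [r + 0.03 for r in baseline]                   # later aging run

coeffs = fit_quadratic(temps, baseline)
residuals = [r - evaluate(coeffs, tc) for tc, r in zip(temps, aged)]
mean_shift = sum(residuals) / len(residuals)  # upward migration of the curve
```

Comparing residuals at the same $T_c$ removes most of the temperature dependence, so the remaining shift is a candidate damage indicator.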



Figure 11:  $R_{DS(on)}$  vs.  $T_c$  for all the aging runs.



Figure 12:  $R_{DS(on)}$  vs. aging time for all devices.

Figure 12 shows $R_{DS(on)}$ as a function of aging time for all the devices. $R_{DS(on)}$ is normalized based on the values measured at pristine condition. As a result, this plot shows how $R_{DS(on)}$ increases as aging progresses and damage grows in the die-attach region. It can also be observed that failure could be defined before the seventh aging run. For example, on device 14, there are high values of $R_{DS(on)}$ in the neighborhood of the $300^{th}$ minute of aging. For all those high values, the device loses gate control and cannot turn on. This is considered a failure. Further investigation is required to define the failure criteria under the die-attach failure mechanism. Qualification standards for reliability specify the amount of deviation of a device parameter at which the device is considered failed. There is a risk involved in declaring the failure too late. Internal temperature keeps rising as a result of die-attach damage, and eventually the high temperature will cause other structures, such as the gate oxide, to fail. This will in turn make it more challenging to predict remaining life if multiple failure mechanisms are involved in the process or experiment.
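The normalization and a threshold-based failure definition can be sketched as follows. The trajectory, the number of baseline samples, and the 1.3 threshold are all hypothetical; as noted above, the actual failure criterion still requires further investigation.

```python
def normalize(series, n_baseline=3):
    """Divide each value by the mean of the first n_baseline samples."""
    base = sum(series[:n_baseline]) / n_baseline
    return [v / base for v in series]


def first_crossing(times, values, threshold):
    """Return the first time at which values reach the threshold, else None."""
    for t, v in zip(times, values):
        if v >= threshold:
            return t
    return None


# Made-up trajectory: aging time (minutes) and R_DS(on) (ohms).
minutes = [0, 35, 70, 105, 140, 380, 560, 740]
r_on = [0.20, 0.20, 0.20, 0.21, 0.21, 0.24, 0.27, 0.30]

r_norm = normalize(r_on)                      # pristine condition maps to ~1.0
t_fail = first_crossing(minutes, r_norm, threshold=1.3)
```

A remaining useful life estimate at any time before the crossing would then be the difference between the predicted crossing time and the current aging time.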

Figure 13 shows the increase in $R_{DS(on)}$ for each of the 7 aging runs. The y-axis represents the increase in $R_{DS(on)}$ at the end of the aging run relative to the value at the start of the run. The results are presented in box and whisker plots to aggregate results for the 5 devices under test. For example, the sixth box summarizes the increase for the sixth aging run (described in Table 1) for the 5 samples (5 aged devices). The red line in each box represents the median and the blue lines of the box represent the first and third quartiles. They are used here to estimate the location and spread of the increase in $R_{DS(on)}$ for each of the aging runs. It should be noted that the aging parameters are fixed for each aging run in terms of aging time, load, $V_{GS}$ and $V_{DD}$. Small increments can be observed in the first 4 aging runs (35-minute runs). A big shift in the parameter is observed in the fifth run, which lasted for 4 hours. In general, the increase in $R_{DS(on)}$ is larger for aging runs of longer duration. This is expected, since aging time is proportional to the number of thermal cycles. The larger the number of thermal cycles, the larger the damage to the die-attach and the higher the values of $R_{DS(on)}$.



Figure 13: Changes in on-resistance for each aging run.
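The statistics summarized by each box can be reproduced with the Python standard library. The per-device increases below are hypothetical, and the quartile convention (`method="inclusive"`) is a choice made for this sketch; plotting tools may use a slightly different convention.

```python
import statistics


def box_summary(increases):
    """Return (first quartile, median, third quartile) of a sample."""
    q1, q2, q3 = statistics.quantiles(increases, n=4, method="inclusive")
    return q1, q2, q3


# Hypothetical R_DS(on) increases (ohms) for five devices in one aging run.
run_increase = [0.010, 0.012, 0.015, 0.020, 0.030]
q1, med, q3 = box_summary(run_increase)
spread = q3 - q1  # interquartile range, the height of the box
```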

### 5. CONCLUSION

A methodology for accelerated aging of a commercial power MOSFET (IRF520NPbF) in a TO-220 package is presented. This methodology, based on thermal and power cycling, triggers the die-attach failure mechanism, which is a common failure mechanism for discrete devices where the chip is attached to the case of the package by lead-free solder. Experiments with X-ray imaging and thermal performance assessment of the structure corroborate the known theory that thermal cycling results in die-attach degradation for flip-chip type structures.

In addition, on-resistance ($R_{DS(on)}$) has been identified as a precursor of failure for the die-attach failure mechanism. Its dependence on junction temperature provides a window into the degradation process, and it could be used for data-driven prognostics algorithms or for the development of physics-based models to be used for prognostics in a Bayesian update framework.

The contributions of this work are manifold. It provides a consistent methodology for accelerated aging of devices for PHM development. This methodology is based on power cycling, which is closer to real-life application than standard thermal cycling performed in an environmental chamber with no electricity running through the device. The identification of $R_{DS(on)}$ as a precursor of failure and the methodology to normalize $R_{DS(on)}$ with respect to the temperature of the device are very important. Work done in the past considered this variable as a precursor of failure, but the measured values throughout the aging process were highly influenced by the environment temperature, making it difficult to assess the degradation in isolation. $R_{DS(on)}$, as presented here, is proportional to the damage magnitude of the device and could serve as a feature for fault detection and diagnosis. Furthermore, it could be used to develop remaining useful life prediction algorithms. This is possible because $R_{DS(on)}$ is computed from in situ measurements in the aging system, so values of $R_{DS(on)}$ can be computed throughout the aging process. The data obtained from these experiments and presented in this work represent run to failure data for five devices aged under the same varying operational conditions. They contain in situ high speed measurements of key physical variables that allow for the characterization of the transient response and hence the computation of key static and dynamic parameters of the devices.

Future work in this area will consist of the development of data-driven prognostics algorithms based on this dataset. In addition, efforts to model the degradation process based on the physics are taking place. This dataset will be made public in order to provide the PHM research community with run to failure data for the development of prognostics algorithms for power MOSFETs.

### **NOMENCLATURE**

| Symbol        | Description                                                             |
|---------------|-------------------------------------------------------------------------|
| $R_{DS(on)}$  | Drain to source on-state resistance                                     |
| $T_j$         | Junction temperature (°C)                                               |
| $T_c$         | Case temperature (copper plate) (°C)                                    |
| $V_{DS}$      | Drain to source voltage                                                 |
| $V_{DD}$      | Power supply voltage applied to the drain-source circuit (rail voltage) |
| $\theta_{jc}$ | Junction to case thermal resistance ($^{\circ}\mathrm{C}/W$)            |
| $I_D$         | Drain current (A)                                                       |

### ACKNOWLEDGMENTS

The authors would like to express their gratitude to Dr. Antonio Ginart at Impact Technologies for participating in research discussions; Nishad Patil from CALCE at the University of Maryland for helping in the failure analysis of the devices; and Dr. Bernie Siegal from Thermal Engineering Associates for the help on the thermal analysis. This work was funded by NASA Aviation Safety Program-IVHM Project.

### REFERENCES

- Baliga, B. (2008). Fundamentals of Power Semiconductor Devices. Springer-Verlag.
- Celaya, J. R., Patil, N., Saha, S., Wysocki, P., & Goebel, K. (2009). Towards Accelerated Aging Methodologies and Health Management of Power MOSFETs (Technical Brief). In Annual Conference of the Prognostics and Health Management Society 2009. San Diego, CA.
- Dupont, L., Lefebvre, S., Bouaroudj, M., Khatir, Z., & Faugires, J. C. (2007). Failure modes on low voltage power MOSFETs under high temperature application. *Microelectronics Reliability*, 47(9-11), 1767-1772.
- Ginart, A., Roemer, M., Kalgren, P., & Goebel, K. (2008). Modeling Aging Effects of IGBTs in Power Drives by Ringing Characterization. In IEEE International Conference on Prognostics and Health Management.
- Katsis, D., & Wyk, D. (2003). Void-Induced Thermal Impedance in Power Semiconductor Modules: Some Transient Temperature Effects., 39, 1239-1246.
- Khong, B., Legros, M., Tounsi, P., Dupuy, P., Chauffleur, X., Levade, C., et al. (2007). Characterization and modelling of ageing failures on power MOSFET devices. *Microelectronics Reliability*, 47, 1735-1740.
- Khong, B., Tounsi, P., Dupuy, P., & Chauffleur, X. (2005). Innovative methodology for predictive reliability of intelligent power devices using extreme electrothermal fatigue. *Microelectronics Reliability*, 45, 1717-1722.
- Morozumi, A., Yamada, K., Miyasaka, T., Sumi, S., & Seki, Y. (2003). Reliability of Power Cycling for IGBT Power Semiconductor Modules. , 39, 665-671.
- Patil, N., Celaya, J., Das, D., Goebel, K., & Pecht, M. (2009). Precursor Parameter Identification for Insulated Gate Bipolar Transistor (IGBT) Prognostics. *Reliability, IEEE Transactions on*, 58(2), 271-276.
- Reynolds, F. (1974). Thermally Accelerated Aging Of Semiconductor Components., 62.
- Saha, B., Celaya, J. R., Wysocki, P. F., & Goebel, K. F. (2009). Towards prognostics for electronics components. In *Aerospace conference*, 2009 *IEEE* (p. 1-7).
- Sonnenfeld, G., Goebel, K., & Celaya, J. R. (2008). An agile accelerated aging, characterization and scenario simulation system for gate controlled power transistors. In *AUTOTESTCON*, 2008 IEEE (p. 208-215).
- Stathis, J. H., Linder, B. P., Pey, K. L., Palumbo, F., & Tung, C. H. (2005). Dielectric Breakdown Mechanisms in Gate Oxides. *Journal of Applied Physics*, 98.
- Stojadinovic, N., Manic, I., Davidovic, V., Dankovic, D., Djoric-Veljkovic, S., Golubovic, S., et al. (2005). Effects of electrical stressing in power VDMOSFETs. *Microelectronics Reliability*, 45, 115-122.
- Suhir, E. (1986). Stresses in Bi-Metal Thermostats. *Journal of Applied Mechanics*, *53*(3), 657-660.
- Suhir, E. (2007). How to Make a Device into a Product: Accelerated Life Testing (ALT), Its Role, Attributes, Challenges, Pitfalls, and Interaction with Qualification Tests. In E. Suhir, Y. Lee, & C. Wong (Eds.), Micro- and Opto-Electronic Materials and Structures: Physics, Mechanics, Design, Reliability, Packaging (Vol. 2). Springer US.
- Thebaud, J., Woirgard, E., Zardini, C., Azzopardi, S., Briat, O., & Vinassa, J. (2003). Strategy for Designing Accelerated Aging Tests to Evaluate IGBT Power Modules Lifetime in Real Operation Mode. , 26, 429-438.
- Wu, W., Held, M., Jacob, P., Scacco, P., & Birolini, A. (1995). Thermal Stress Related Packaging Failure in Power IGBT Modules. In 1995 International Symposium on Power Semiconductor Devices and ICs. Yokohama.

Jose R. Celaya is a research scientist with Stinger Ghaffarian Technologies at the Prognostics Center of Excellence, NASA Ames Research Center. He received a Ph.D. degree in Decision Sciences and Engineering Systems in 2008, an M.E. degree in Operations Research and Statistics in 2008, and an M.S. degree in Electrical Engineering in 2003, all from Rensselaer Polytechnic Institute, Troy, New York; and a B.S. in Cybernetics Engineering in 2001 from CETYS University, Mexico.

Abhinav Saxena is a Research Scientist with SGT Inc. at the Prognostics Center of Excellence, NASA Ames Research Center, Moffett Field, CA. His research focus lies in developing and evaluating prognostic algorithms for engineering systems using soft computing techniques. He holds a PhD in Electrical and Computer Engineering from Georgia Institute of Technology, Atlanta. He earned his B.Tech in 2001 from Indian Institute of Technology (IIT) Delhi, and a Masters Degree in 2003 from Georgia Tech. Abhinav has been a GM manufacturing scholar and is also a member of IEEE, AAAI and ASME.

Philip Wysocki has extensive knowledge and background in test model based data acquisition, as well as programming and system design for diagnostics. He has developed and implemented test design for aging and characterizing IC's and environmental testing. This includes optimization of test hardware and software for prognostics. Phil earned a Bachelor of Science Degree in Computer Science along with over 25 years experience demonstrated at NASA Ames Research Center.

Sankalita Saha received her B.Tech (Bachelor of Technology) degree in Electronics and Electrical Communication Engineering from Indian Institute of Technology, Kharagpur, India in 2002 and Ph.D. degree in Electrical and Computer Engineering from University of Maryland, College Park in 2007. She is currently a Research scientist working with Mission Critical Technologies at NASA Ames Research Center, Moffett Field, CA. Her research interests are in prognostics algorithms and architectures, distributed systems, and system synthesis.

Kai Goebel received the degree of Diplom-Ingenieur from the Technische Universität München, Germany in 1990. He received the M.S. and Ph.D. from the University of California at Berkeley in 1993 and 1996, respectively. Dr. Goebel is a senior scientist at NASA Ames Research Center where he leads the Diagnostics and Prognostics groups in the Intelligent Systems division. In addition, he directs the Prognostics Center of Excellence and he is the Associate Principal Investigator for Prognostics of NASA's Integrated Vehicle Health Management Program. He worked at General Electric's Corporate Research Center in Niskayuna, NY from 1997 to 2006 as a senior research scientist. He has carried out applied research in the areas of artificial intelligence, soft computing, and information fusion. His research interest lies in advancing these techniques for real time monitoring, diagnostics, and prognostics. He holds eleven patents and has published more than 100 papers in the area of systems health management.